Take-home Exercise 1

Published

January 30, 2023

1 Introduction

In this take home exercise, we are tasked to apply appropriate spatial point pattern analysus methods to discover the geographical distribution of functional and non-functional waterpoints and their colocations if any in Osun state, Nigeria.

The main tasks of this exercise includes

  • Exploratory Spatial Data Analaysis

  • Second-order Spatial Point Pattern Analysis

  • Spatial Correlation Analysis

1.1 Installing and Loading R packages

We first start off by loading the necessary R packages into our platform.

pacman::p_load(maptools, sf, raster, spatstat, tmap, tidyverse, funModeling, sfdep)

1.2 Bringing data into platform

1.2.1 Geospatial data

This study will focus of Osun State, Nigeria. The state boundary GIS data of Nigeria can be downloaded from geoBoundaries.

1.2.1.1 Reading geospatial data

Within the geoboundaries data, we choose to use ADM2 data given that we want to investigate the distribution of water pumps within the LGAs in Osun.

geoNGA2 <- st_read(dsn = "data/geospatial", layer = "nga_admbnda_adm2_osgof_20190417") |>  st_transform(crs = 26392) |> 
  arrange(ADM2_EN)
Reading layer `nga_admbnda_adm2_osgof_20190417' from data source 
  `/Users/pengyouyun/youyunpeng/IS415/Take-home_Ex/Take-home_Ex01/data/geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 774 features and 16 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 2.668534 ymin: 4.273007 xmax: 14.67882 ymax: 13.89442
Geodetic CRS:  WGS 84
#check for duplicates
geoNGA2$ADM2_EN[duplicated(geoNGA2$ADM2_EN) == TRUE]
[1] "Bassa"    "Ifelodun" "Irepodun" "Nasarawa" "Obi"      "Surulere"
duplicated_LGA <- geoNGA2$ADM2_EN[duplicated(geoNGA2$ADM2_EN) == TRUE]
# Get all the indices with names that are included in the duplicated LGA names
duplicated_indices <- which(geoNGA2$ADM2_EN %in% duplicated_LGA)

geoNGA2$ADM2_EN[94] <- "Bassa, Kogi"
geoNGA2$ADM2_EN[95] <- "Bassa, Plateau"
geoNGA2$ADM2_EN[304] <- "Ifelodun, Kwara"
geoNGA2$ADM2_EN[305] <- "Ifelodun, Osun"
geoNGA2$ADM2_EN[355] <- "Irepodun, Kwara"
geoNGA2$ADM2_EN[356] <- "Irepodun, Osun"
geoNGA2$ADM2_EN[519] <- "Nasarawa, Kano"
geoNGA2$ADM2_EN[520] <- "Nasarawa, Nasarawa"
geoNGA2$ADM2_EN[546] <- "Obi, Benue"
geoNGA2$ADM2_EN[547] <- "Obi, Nasarawa"
geoNGA2$ADM2_EN[693] <- "Surulere, Lagos"
geoNGA2$ADM2_EN[694] <- "Surulere, Oyo"

The code chunk below filters out geoNGA2 into Osun state Local Government Area

osun_LGA <- c("Aiyedade","Aiyedire","Atakumosa East",   "Atakumosa West",   
              "Ede North",  "Ede South",    "Egbedore", "Ejigbo",   "Ife Central",  
              "Ife East",   "Ife North",    "Ife South",    "Ifedayo",  "Ila",
              "Ifelodun, Osun","Irepodun, Osun","Ilesha East",  "Ilesha West",
              "Irewole",    "Isokan",   "Iwo",  "Obokun",   "Odo-Otin", "Ola-oluwa",    
              "Olorunda",   "Oriade",   "Orolu",    "Osogbo", "Boripe", "Boluwaduro")

bd <- geoNGA2 |> 
  filter(ADM2_EN %in% osun_LGA) #create border sf that filters out Osun LGAs

qtm(bd) #checking if the border data is correctly filtered

1.2.2 Aspatial data

For the purpose of this assignment, data from WPdx Global Data Repositories will be used.

1.2.2.1 Reading aspatial data

Again, in the waterpoint data can be narrowed down to only osun state.

wp_osun <- read_csv("data/aspatial/WPdx.csv") %>%
  filter(`#clean_country_name` == "Nigeria", `#clean_adm1`=="Osun")

With our packages and data in place, we can now start with our analysis!

2 Exploratory Spatial Data Analysis

In deriving kernel density maps of functional and non-functional water points, we need to under-go two main processes, namely data conversion from sf to ppp for spatstat, and using spatstat functions to create a KDE object. We first start with data conversion from sf to ppp.

2.1 Data conversion from sf to ppp

2.1.1 Converting water point data into sf point features

First we need to convert the wkt field into sfc field by using st_as_sfc() data type. Next we will convert the tibble data.frame into an sf object by isomh st_sf(). it is also important for us to include the referencing system of the data into the sf object. In this case, it has the CRS of WGS 84, so we set the crs to EPSG code 4326.

wp_osun$Geometry = st_as_sfc(wp_osun$`New Georeferenced Column`)
wp_osun
# A tibble: 5,557 × 71
   row_id `#source`      #lat_…¹ #lon_…² #repo…³ #stat…⁴ #wate…⁵ #wate…⁶ #wate…⁷
    <dbl> <chr>            <dbl>   <dbl> <chr>   <chr>   <chr>   <chr>   <chr>  
 1 429123 GRID3             8.02    5.06 08/29/… Unknown <NA>    <NA>    Tapsta…
 2  70566 Federal Minis…    7.32    4.79 05/11/… No      Protec… Well    Mechan…
 3  70578 Federal Minis…    7.76    4.56 05/11/… No      Boreho… Well    Mechan…
 4  66401 Federal Minis…    8.03    4.64 04/30/… No      Boreho… Well    Mechan…
 5 422190 GRID3             7.87    4.88 08/29/… Unknown <NA>    <NA>    Tapsta…
 6 422064 GRID3             7.7     4.89 08/29/… Unknown <NA>    <NA>    Tapsta…
 7  65607 Federal Minis…    7.89    4.71 05/12/… No      Boreho… Well    Mechan…
 8  68989 Federal Minis…    7.51    4.27 05/07/… No      Boreho… Well    <NA>   
 9  67708 Federal Minis…    7.48    4.35 04/29/… Yes     Boreho… Well    Mechan…
10  66419 Federal Minis…    7.63    4.50 05/08/… Yes     Boreho… Well    Hand P…
# … with 5,547 more rows, 62 more variables: `#water_tech_category` <chr>,
#   `#facility_type` <chr>, `#clean_country_name` <chr>, `#clean_adm1` <chr>,
#   `#clean_adm2` <chr>, `#clean_adm3` <chr>, `#clean_adm4` <chr>,
#   `#install_year` <dbl>, `#installer` <chr>, `#rehab_year` <lgl>,
#   `#rehabilitator` <lgl>, `#management_clean` <chr>, `#status_clean` <chr>,
#   `#pay` <chr>, `#fecal_coliform_presence` <chr>,
#   `#fecal_coliform_value` <dbl>, `#subjective_quality` <chr>, …
wp_sf <-  st_sf(wp_osun, crs=4326) #convert to sf, tell R what crs used for projection 
wp_sf
Simple feature collection with 5557 features and 70 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 4.032004 ymin: 7.060309 xmax: 5.06 ymax: 8.061898
Geodetic CRS:  WGS 84
# A tibble: 5,557 × 71
   row_id `#source`      #lat_…¹ #lon_…² #repo…³ #stat…⁴ #wate…⁵ #wate…⁶ #wate…⁷
 *  <dbl> <chr>            <dbl>   <dbl> <chr>   <chr>   <chr>   <chr>   <chr>  
 1 429123 GRID3             8.02    5.06 08/29/… Unknown <NA>    <NA>    Tapsta…
 2  70566 Federal Minis…    7.32    4.79 05/11/… No      Protec… Well    Mechan…
 3  70578 Federal Minis…    7.76    4.56 05/11/… No      Boreho… Well    Mechan…
 4  66401 Federal Minis…    8.03    4.64 04/30/… No      Boreho… Well    Mechan…
 5 422190 GRID3             7.87    4.88 08/29/… Unknown <NA>    <NA>    Tapsta…
 6 422064 GRID3             7.7     4.89 08/29/… Unknown <NA>    <NA>    Tapsta…
 7  65607 Federal Minis…    7.89    4.71 05/12/… No      Boreho… Well    Mechan…
 8  68989 Federal Minis…    7.51    4.27 05/07/… No      Boreho… Well    <NA>   
 9  67708 Federal Minis…    7.48    4.35 04/29/… Yes     Boreho… Well    Mechan…
10  66419 Federal Minis…    7.63    4.50 05/08/… Yes     Boreho… Well    Hand P…
# … with 5,547 more rows, 62 more variables: `#water_tech_category` <chr>,
#   `#facility_type` <chr>, `#clean_country_name` <chr>, `#clean_adm1` <chr>,
#   `#clean_adm2` <chr>, `#clean_adm3` <chr>, `#clean_adm4` <chr>,
#   `#install_year` <dbl>, `#installer` <chr>, `#rehab_year` <lgl>,
#   `#rehabilitator` <lgl>, `#management_clean` <chr>, `#status_clean` <chr>,
#   `#pay` <chr>, `#fecal_coliform_presence` <chr>,
#   `#fecal_coliform_value` <dbl>, `#subjective_quality` <chr>, …

However, we want to convert the coordinate reference system to Nigeria’s projected coordinate system. We use st_transform(), and include the EPSG code for Nigeria’s projected coordinate system: 26392.

wp_sf <- wp_sf %>%
  st_transform(crs = 26392)

qtm(wp_sf) #quick view

2.1.2 Data wrangling for waterpoint data

In cleaning the waterpoint data, we first rename the column from #status_clean to status_clean for easier handling in subsequent steps. select() of dplyr is used to include status_clean in the output sf data.frame. - mutate() and replace_na() are used to recode all the NA values in status_clean into unknown.

wp_sf_nga <- wp_sf |> 
  rename(status_clean = '#status_clean') |> 
  select(status_clean) |> 
  mutate(status_clean = replace_na(
    status_clean, "unknown"
  ))

2.1.3 Extracting water point data

Now we are ready to extract the water point data according to their status.

The code chunk below is used to extract functional water points.

wp_functional <- wp_sf_nga |> 
  filter(status_clean %in%
           c("Functional",
             "Functional but not in use",
             "Functional but needs repair"))

The code chunk below is used to extract nonfunctional waterpoint.

wp_nonfunctional <- wp_sf_nga |> 
  filter(status_clean %in% 
           c("Abandoned/Decommissioned",
             "Abandoned",
             "Non-Functional due to dry season",
             "Non-Functional",
             "Non functional due to dry season"))

The code chunk below is used to extract water point with unknown status.

wp_unknown <- wp_sf_nga |> 
  filter(status_clean %in% 
           c("unknown"))

Next, the code chunk below is used to perform a quick EDA on the derived sf data.frames.

freq(data = wp_functional,
     input = 'status_clean')

                 status_clean frequency percentage cumulative_perc
1                  Functional      2319      88.17           88.17
2 Functional but needs repair       248       9.43           97.60
3   Functional but not in use        63       2.40          100.00
freq(data = wp_nonfunctional,
     input = 'status_clean')

                      status_clean frequency percentage cumulative_perc
1                   Non-Functional      2008      92.15           92.15
2 Non-Functional due to dry season       151       6.93           99.08
3                        Abandoned        15       0.69           99.77
4         Abandoned/Decommissioned         5       0.23          100.00

2.2 Data Wrangling to Prepare Data for Spatstat

2.2.1 Converting sf data frames to sp’s Spatial Class

The code chunk below uses as_Spatial(). of sf package to convert the three geospatial data from simple feature data frame to sp’s Spatial* class.

NGA_bd_sc<-as(bd, "Spatial")
NGA_bd_sc
class       : SpatialPolygonsDataFrame 
features    : 30 
extent      : 176503.2, 291043.8, 331434.7, 454520.1  (xmin, xmax, ymin, ymax)
crs         : +proj=tmerc +lat_0=4 +lon_0=8.5 +k=0.99975 +x_0=670553.98 +y_0=0 +a=6378249.145 +rf=293.465 +towgs84=-92,-93,122,0,0,0,0 +units=m +no_defs 
variables   : 16
names       :    Shape_Leng,       Shape_Area,  ADM2_EN, ADM2_PCODE, ADM2_REF, ADM2ALT1EN, ADM2ALT2EN, ADM1_EN, ADM1_PCODE, ADM0_EN, ADM0_PCODE,  date, validOn, validTo,        SD_EN, ... 
min values  : 0.26445678806, 0.00248649736648, Aiyedade,   NG030001, Aiyedade,         NA,         NA,    Osun,      NG030, Nigeria,         NG, 17134,   18003,      NA, Osun Central, ... 
max values  :  1.8470166597,  0.0737271661922,   Osogbo,   NG030030,   Osogbo,         NA,         NA,    Osun,      NG030, Nigeria,         NG, 17134,   18003,      NA,    Osun West, ... 
func<-as(wp_functional, "Spatial")
func
class       : SpatialPointsDataFrame 
features    : 2630 
extent      : 177285.9, 290751, 343128.1, 450859.7  (xmin, xmax, ymin, ymax)
crs         : +proj=tmerc +lat_0=4 +lon_0=8.5 +k=0.99975 +x_0=670553.98 +y_0=0 +a=6378249.145 +rf=293.465 +towgs84=-92,-93,122,0,0,0,0 +units=m +no_defs 
variables   : 1
names       :              status_clean 
min values  :                Functional 
max values  : Functional but not in use 
nonfunc<-as(wp_nonfunctional, "Spatial")
nonfunc
class       : SpatialPointsDataFrame 
features    : 2179 
extent      : 180539, 290616, 340054.1, 450780.1  (xmin, xmax, ymin, ymax)
crs         : +proj=tmerc +lat_0=4 +lon_0=8.5 +k=0.99975 +x_0=670553.98 +y_0=0 +a=6378249.145 +rf=293.465 +towgs84=-92,-93,122,0,0,0,0 +units=m +no_defs 
variables   : 1
names       :                     status_clean 
min values  :                        Abandoned 
max values  : Non-Functional due to dry season 

2.2.2 Converting the Spatial* class into generic sp format

spatstat requires the analytical data in ppp object form. There is no direct way to convert a Spatial* classes into ppp object. We need to convert the Spatial classes* into Spatial object first.

The codes chunk below converts the Spatial* classes into generic sp objects.

NGA_bd_sp <- as(NGA_bd_sc, "SpatialPolygons")

funcsp<-as(func, "SpatialPoints")
nonfuncsp<-as(nonfunc, "SpatialPoints")

2.2.3 Convert generic sp format into spatstat’s ppp format

funcppp<-as(funcsp, "ppp")
nonfuncppp<-as(nonfuncsp,"ppp")

2.2.4 Checking for Duplicated Points

We can check for the duplication in a ppp object by using the code chunk below. We see that there are no duplicated points.

any(duplicated(funcppp))
[1] FALSE
any(duplicated(nonfuncppp))
[1] FALSE

2.2.5 Creating Owin Object

To confine our analysis with the geographical area Osun, we create an object called owin in spatstat to represent the polygonal region.

The code chunk below is used to convert NGA_bd_sp into owin object of spatstat.

NGA_owin<- as(NGA_bd_sp, "owin")

2.2.6 Combine point events object and owin object

In this last step of geospatial data wrangling, we will extract the waterpoints that are located within Osun by using the code chunk below.

#funcppp
funcppp1<-funcppp[NGA_owin]
summary(funcppp1)
Planar point pattern:  2529 points
Average intensity 2.928471e-07 points per square unit

Coordinates are given to 2 decimal places
i.e. rounded to the nearest multiple of 0.01 units

Window: polygonal boundary
30 separate polygons (no holes)
            vertices      area relative.area
polygon 1        204 766084000       0.08870
polygon 2         81 304399000       0.03520
polygon 3         97 465688000       0.05390
polygon 4        124 373051000       0.04320
polygon 5         60 149473000       0.01730
polygon 6         84 144820000       0.01680
polygon 7         50 102243000       0.01180
polygon 8         72 216002000       0.02500
polygon 9        112 269897000       0.03130
polygon 10       125 365142000       0.04230
polygon 11        83 111191000       0.01290
polygon 12       126 192557000       0.02230
polygon 13       219 904397000       0.10500
polygon 14       174 741131000       0.08580
polygon 15        81 138742000       0.01610
polygon 16        65 119452000       0.01380
polygon 17        90 280205000       0.03240
polygon 18        69  69814600       0.00808
polygon 19        69  42727500       0.00495
polygon 20        49  30458800       0.00353
polygon 21        62 263505000       0.03050
polygon 22        93 438930000       0.05080
polygon 23        87 274127000       0.03170
polygon 24       105 509979000       0.05910
polygon 25        98 292058000       0.03380
polygon 26        64 327765000       0.03800
polygon 27       133 108945000       0.01260
polygon 28       122 462169000       0.05350
polygon 29        94 109715000       0.01270
polygon 30        95  61239800       0.00709
enclosing rectangle: [176503.22, 291043.82] x [331434.7, 454520.1] units
                     (114500 x 123100 units)
Window area = 8635910000 square units
Fraction of frame area: 0.613
plot(funcppp1)

#nonfuncppp
nonfuncppp1<-nonfuncppp[NGA_owin]
summary(nonfuncppp1)
Planar point pattern:  2059 points
Average intensity 2.384232e-07 points per square unit

Coordinates are given to 2 decimal places
i.e. rounded to the nearest multiple of 0.01 units

Window: polygonal boundary
30 separate polygons (no holes)
            vertices      area relative.area
polygon 1        204 766084000       0.08870
polygon 2         81 304399000       0.03520
polygon 3         97 465688000       0.05390
polygon 4        124 373051000       0.04320
polygon 5         60 149473000       0.01730
polygon 6         84 144820000       0.01680
polygon 7         50 102243000       0.01180
polygon 8         72 216002000       0.02500
polygon 9        112 269897000       0.03130
polygon 10       125 365142000       0.04230
polygon 11        83 111191000       0.01290
polygon 12       126 192557000       0.02230
polygon 13       219 904397000       0.10500
polygon 14       174 741131000       0.08580
polygon 15        81 138742000       0.01610
polygon 16        65 119452000       0.01380
polygon 17        90 280205000       0.03240
polygon 18        69  69814600       0.00808
polygon 19        69  42727500       0.00495
polygon 20        49  30458800       0.00353
polygon 21        62 263505000       0.03050
polygon 22        93 438930000       0.05080
polygon 23        87 274127000       0.03170
polygon 24       105 509979000       0.05910
polygon 25        98 292058000       0.03380
polygon 26        64 327765000       0.03800
polygon 27       133 108945000       0.01260
polygon 28       122 462169000       0.05350
polygon 29        94 109715000       0.01270
polygon 30        95  61239800       0.00709
enclosing rectangle: [176503.22, 291043.82] x [331434.7, 454520.1] units
                     (114500 x 123100 units)
Window area = 8635910000 square units
Fraction of frame area: 0.613
plot(nonfuncppp1)

2.2.7 Rescale ppp data into km

In the code chunk below, rescale() is used to convert the unit of measurement from meter to kilometer

funcppp1.km<-rescale(funcppp1, 1000, "km")
nonfuncppp1.km<-rescale(nonfuncppp1, 1000, "km")

With our data cleaned and in thr right format, we can move on to compute the Kernel Density Estimation!

2.3 First-order Spatial Point Pattern Analysis

2.3.1 Computing kernel density Estimation using adaptive bandwidth selection method

In spatial analysis, the choice of bandwidth is important for determining the smoothness of a surface fit to the data. The bandwidth determines the width of the smoothing kernel used in spatial smoothing techniques like kernel density estimation or local regression. We compare between two types of bandwidth methods, fixed and adaptive bandwidth, to be applied to this situation.

Fixed bandwidth methods use a constant value for the bandwidth throughout the analysis, regardless of the distribution of the data. This can be useful when the data has a consistent structure, but if the data is highly variable or has multiple modes, a constant bandwidth may not provide an adequate fit to the data.

Adaptive bandwidth methods, on the other hand, use a variable bandwidth that adjusts based on the local structure of the data. This allows for more flexibility in the fitting process.

From our initial plots, we can see that there is some evidence of clustering, leading to our choice of using adaptive bandwidth in our analysis.

funckde_adaptive<- adaptive.density(funcppp1.km, method="kernel")

nonfunckde_adaptive<- adaptive.density(nonfuncppp1.km, method="kernel")

par(mfrow=c(1,2))
plot(funckde_adaptive, main="Functional Waterpoint KDE using Adaptive Bandwidth")
plot(nonfunckde_adaptive, main="Non-Functional Waterpoint KDE using Adaptive Bandwidth")

2.3.2 Convert KDE into grid object

Converting KDE into a gridded object that is suitable for mapping purposes.

gridded_kde_funckde <- as.SpatialGridDataFrame.im(funckde_adaptive)

gridded_kde_nonfunckde <- as.SpatialGridDataFrame.im(nonfunckde_adaptive)

2.3.3 Convert gridded output into raster

Next we convert the gridded kernal density objects into RasterLayer objet using raster() of raster package.

funckde_raster<-raster(gridded_kde_funckde)
funckde_raster
class      : RasterLayer 
dimensions : 128, 128, 16384  (nrow, ncol, ncell)
resolution : 0.8948485, 0.9616045  (x, y)
extent     : 176.5032, 291.0438, 331.4347, 454.5201  (xmin, xmax, ymin, ymax)
crs        : NA 
source     : memory
names      : v 
values     : 1.320603e-16, 23.9989  (min, max)
nonfunckde_raster<-raster(gridded_kde_nonfunckde)
nonfunckde_raster
class      : RasterLayer 
dimensions : 128, 128, 16384  (nrow, ncol, ncell)
resolution : 0.8948485, 0.9616045  (x, y)
extent     : 176.5032, 291.0438, 331.4347, 454.5201  (xmin, xmax, ymin, ymax)
crs        : NA 
source     : memory
names      : v 
values     : 6.85125e-17, 20.92404  (min, max)

Notice that the crs is NA

2.3.4 Assigning Projection Systems

The code chunk below is used to include the CRS information funckde_raster and nonfunckde_raster.

projection(funckde_raster) <- CRS("+init=EPSG:26392 +units=km")
funckde_raster
class      : RasterLayer 
dimensions : 128, 128, 16384  (nrow, ncol, ncell)
resolution : 0.8948485, 0.9616045  (x, y)
extent     : 176.5032, 291.0438, 331.4347, 454.5201  (xmin, xmax, ymin, ymax)
crs        : +proj=tmerc +lat_0=4 +lon_0=8.5 +k=0.99975 +x_0=670553.98 +y_0=0 +a=6378249.145 +rf=293.465 +units=km +no_defs 
source     : memory
names      : v 
values     : 1.320603e-16, 23.9989  (min, max)
res(funckde_raster)
[1] 0.8948485 0.9616045
projection(nonfunckde_raster) <- CRS("+init=EPSG:26392 +units=km")
nonfunckde_raster
class      : RasterLayer 
dimensions : 128, 128, 16384  (nrow, ncol, ncell)
resolution : 0.8948485, 0.9616045  (x, y)
extent     : 176.5032, 291.0438, 331.4347, 454.5201  (xmin, xmax, ymin, ymax)
crs        : +proj=tmerc +lat_0=4 +lon_0=8.5 +k=0.99975 +x_0=670553.98 +y_0=0 +a=6378249.145 +rf=293.465 +units=km +no_defs 
source     : memory
names      : v 
values     : 6.85125e-17, 20.92404  (min, max)
res(nonfunckde_raster)
[1] 0.8948485 0.9616045

2.3.5 Viewing object in tmap

Finally, we will display the raster in cartographic quality map using tmap package.

tmap_mode("view")
tm_basemap(server ="OpenStreetMap")+ 
tm_shape(funckde_raster) + 
  tm_raster("v") +
  tm_layout(main.title = "Raster Plot of Functional Waterpoint KDE",
            main.title.position = "center",
            main.title.size = 1.2,
            legend.height = 0.45, 
            legend.width = 0.35,
            frame = TRUE)
tmap_mode("view")
tm_basemap(server ="OpenStreetMap")+ 
tm_shape(nonfunckde_raster) + 
  tm_raster("v") +
  tm_layout(main.title = "Raster Plot of Non-Functional Waterpoint KDE",
            main.title.position = "center",
            main.title.size = 1.2,
            legend.height = 0.45, 
            legend.width = 0.35,
            frame = TRUE)

Change tmap mode to plot.

tmap_mode("plot")

3 Second-order Spatial Point Patterns Analysis

For the purposes of Spatial Point Analysis, we are attempting to determine a pattern in terms of either clustering or standardization of point spread. However, in order to do so, we must first reject the null hypothesis that they are randomly distributed to arrive at any conclusion. Thus:

The test hypotheses are:

  • Ho = The distribution of Functional/Non Functional Waterpoints are randomly distributed.

  • H 1= The distribution of Functional/Non Functional Waterpoints services are not randomly distributed.

We will set a 95% confidence interval for the purpose of this study.

3.0.1 Analysing Spatial Point Process Using K-Function

  1. Computing K-function estimate In our analysis, we choose to use the K function illustrates how the spatial clustering or dispersion of feature centroid changes when the neighborhood size changes.

Referencing this website, when the observed K value is larger than the expected K value for a particular distance, the distribution is more clustered than a random distribution at that distance. When the observed K value is smaller than the expected K value, the distribution is more dispersed than a random distribution at that distance.

K_Fun <- Kest(funcppp1, correction = "Ripley")

K_NonFun <- Kest(nonfuncppp1, correction = "Ripley")

par(mfrow=c(1,2))
plot(K_Fun, .-r ~r, ylab="K(d)-r", xlab ="d(m)")
plot(K_NonFun, .-r ~r, ylab="K(d)-r", xlab ="d(m)")

  1. Generating Monte Carlo test with K-Function
K_Fun.csr <- envelope(funcppp1, Kest, nsim = 39, rank = 1, glocal = TRUE)

K_NonFun.csr <- envelope(nonfuncppp1, Kest, nsim = 39, rank = 1, glocal = TRUE)
  1. plot

To interpret the K function simulation results, we plot out the simulations as a confidence interval, looking at the observed spatial pattern, expected spatial pattern and confidence interval.

summary(K_Fun.csr)
summary(K_NonFun.csr)

par(mfrow=c(1,2))
plot(K_Fun.csr, main="Functional Waterpoint K Simulations")
plot(K_NonFun.csr, main="Non-functional Waterpoint K Simulations")

From the graph above, we see that the observed line falls above that of the confidence interval for both functional and non-functional waterpoints. We reject the null hypothesis at the 95% level of significance that the distribution of functional and non functional waterpoints are randomly distributed. From the information gathered earlier, we see that given the observed K values are higher than expected, it is likely that the functional water points are clustered around each other. A similar conclusion can be drawn for non-functional waterpoints.

3.0.2 Performing Clark-Evans test

Referencing this website, we want to use the Clark-Evans test to reinforce our conclusions. The Clark and Evans (1954) aggregation index R is a crude measure of clustering or ordering of a point pattern. It is the ratio of the observed mean nearest neighbour distance in the pattern to that expected for a Poisson point process of the same intensity. A value R>1 suggests ordering, while R<1 suggests clustering.

clarkevans.test(funcppp1,
                correction="none",
                clipregion="NGA_owin",
                alternative=c("two.sided"),
                nsim=39)

    Clark-Evans test
    No edge correction
    Monte Carlo test based on 39 simulations of CSR with fixed n

data:  funcppp1
R = 0.44265, p-value = 0.05
alternative hypothesis: two-sided

3.0.3 Interpretation of the Clark-Evans test

Based on the Clark Evans test with 39 simulations, we find that the index R value is 0.44265, which is less than 1, indicating evidence of clustering. However, as the p-value is 0.05, we are unable to reject the null hypothesis.

3.0.4 Statistical conclusion

To conclude, there is evidence of clustering of the functional and non-functional water points using the K-function and the Clark and Evans test. However, the evidence could be weak. A more definitive answer could be obtained if we had tried to use more simulations in the test.

4 Spatial Correlation Analysis

In this section, we seek to confirm statistically if the spatial distribution of functional and non-functional waterpoints are independent from each other.

4.1 Data Wrangling

We first start with a plot of the distribution of the waterpoints using wp_sf_nga and bd objects defined earlier.

tmap_mode("view")
tm_shape(bd)+
  tm_polygons()+
  tm_shape(wp_sf_nga)+
  tm_dots(col="status_clean")
#streamlining data into functional and non-functional waterpoints only

We realise that under the column “status_clean”, there are too many categories, which can be difficult for interpretation especially when we want to calculate colocation quotients. We will conduct data wrangling in the next step to define the waterpoints distinctly into “functional” and “non-functional”.

nonfunc_df<-wp_sf_nga |> 
filter(status_clean %in% 
           c("Abandoned/Decommissioned",
             "Abandoned",
             "Non-Functional due to dry season",
             "Non-Functional",
             "Non functional due to dry season")) |> 
  mutate(status_redefined="Non-functional")

func_df<-wp_sf_nga |> 
filter(status_clean %in%
           c("Functional",
             "Functional but not in use",
             "Functional but needs repair")) |> 
  mutate(status_redefined="Functional")

#creating a status_redefined column that states if water point is either functional or non functional

df<-bind_rows(func_df, nonfunc_df) |> 
  mutate(status_redefined=factor(status_redefined)) |> 
  select(Geometry, status_redefined)
#combining data frames into 1 df

With our new dataframe, we continue to plot a graph showing the the functional and non-functional water points using tmap.

tmap_mode("view")
tm_shape(bd)+
  tm_polygons()+
  tm_shape(df)+
  tm_dots(col="status_redefined",
          size=0.01,
          border.col="black",
           border.lwd=0.5)
#plotting the combined dataframe

4.2 Local Colocation coefficient

According to this website, the colocation analysis tool measures local patterns of spatial association between two categories of point features using the colocation quotient statistic.

Each feature in the Category of Interest (category A) is evaluated individually for colocation with the presence of the Neighboring Category (category B) found within its neighborhood. In general, if the proportion of B points within the neighborhood of A is more than the global proportion of B, the colocation quotient will be high. If the neighborhood of A contains many other A points or many other categories other than B, the colocation between the Category of Interest (category A) and the Neighboring Category (category B) will be small.

In our analysis, our category of interest (A) is functional waterpoints, and neighboring category (B) is non-functional waterpoints.

4.2.1 Preparing the vector list

FW<-df |> 
  filter(status_redefined == "Functional")
A<- FW$status_redefined

NFW<-df |> 
  filter(status_redefined == "Non-functional")
B<- NFW$status_redefined

4.2.2 Preparing nearest neighbour list

In the code chunk below, st_knn() of sfdep package is used to determine the k (i.e. 6) nearest neighbours for given point geometry.

nb<-include_self(
  st_knn(st_geometry(df), 6)
)

4.2.3 Computing Kernel Weights

In the code chunk below, st_kernel_weights() of sfdep package is used to derive a weights list by using a kernel function.

wt<-st_kernel_weights(nb,
                      df,
                      "gaussian",
                      adaptive=TRUE)

4.2.4 Computing LCLQ

In the code chunk below local_colocation() us used to compute the LCLQ values for each Water point event.

LCLQ<-local_colocation(A, B, nb, wt, 39)

4.2.5 Joining output Table

Before we can plot the LCLQ values their p-values, we need to join the output of local_colocation() to the stores sf data.frame. However, a quick check of LCLQ data-frame, we can’t find any field can be used as the join field. As a result, cbind() of Base R is useed.

LCLQ_WP<-cbind(df,LCLQ)

4.2.6 Plotting LCLQ values

In the code chunk below, tmap functions are used to plot the LCLQ analysis.

#plot the graph
tmap_mode("view")
tm_shape(bd) +
  tm_polygons() + 
tm_shape(LCLQ_WP) +
  tm_dots(col="Non.functional")+
  tm_view(set.zoom.limits = c(9,13))+
tm_shape(LCLQ_WP) +
  tm_dots(col="p_sim_Non.functional")+
  tm_view(set.zoom.limits = c(9,13))

4.2.7 Statistical conclusion

From the statistical table, we see that the colocation coefficient is less than 1 but extremely close to one for some points.

From this website, features that have colocation quotients less than one are less likely to have category B within their neighborhood. If a feature has a colocation quotient equal to one, it means the proportion of categories within their neighborhood is a good representation of the proportion of categories throughout the entire study area.

Therefore, it is likely that there is some correlation between location of functional and non functional water points. Additionally given that the p-value is less than 0.05 for the selected points, we can say that the result is statistically significant, and that Functional and non-functional water points are dependent with each other.

tmap_mode("plot")

4.3 Performing appropriate tests using second order spatial point pattern analysis technique

We will use the Cross-K Function to look into this relationship

The test hypotheses are:

  • Ho = The distribution of functional and non-functional waterpoints are spatially independent from each other (ie randomly distributed).

  • H 1= The ditribution of functional and non-functional waterpoints are not independent from each other (ie not randomly distributed).

We will set a 95% confidence interval for the purpose of this study.

4.3.1 Conversion of LCLQ data into ppp

In this analysis, we seek to perform marked point pattern analysis, based on the associated categorical measurement “status_redefined” in the waterpoint data.

df_spatialpoint<-df |>
  as("Spatial") |> 
  as("SpatialPointsDataFrame") #creating spatial point data frame from sf

df_spatialpoint@data$status_redefined<-as.factor(df_spatialpoint@data$status_redefined) #creating a factor column for status_redefined

df_ppp<-df_spatialpoint |> 
  as("ppp") #converting the spatial data frame into a ppp object

df_ppp_owin<-df_ppp[NGA_owin] #creating an owin object

plot(df_ppp_owin, main = "df_ppp", which.marks = "status_redefined") #creating a quick plot to visualise the ppp object

4.3.2 Using Cross K function to check for distribution trend

We use the cross-K function to analyse the trend of distribution of both functional and non-functional waterpoints.

Lcross.csr <- envelope(df_ppp_owin, 
                                 Lcross, 
                                 i="Functional", 
                                 j="Non-functional", 
                                 correction="border", 
                                 nsim=39)
Generating 39 simulations of CSR  ...
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38,  39.

Done.

We can then plot our result

plot(Lcross.csr, 
     xlim = c(0,10000))

4.3.3 Statistical conclusions

From the graph above, we can conclude that there is evidence of spatial dependence between 0 to 5000, and 7000 to 8000 r values. More specifically, between 0 to 5000, functional and non-functional waterpoints tend to cluster, while between 7000 to 8000, they tend to be evenly distributed.

5 Self-sourced: Comparing population and waterpoint functionality

In this section we aim to come up with useful conclusions regarding the waterpoint trends observed. Specifically, we compare the distribution of clustering of functional and non functional waterpoints with the population spread of Osun, to see if there is any correlation between both factors.

5.1 Importing data

In researching for population data for Osun state, we chose to use population density from this website.

We save the webpage as a html file, and open it using microsoft excel, to generate the table in xls format. Subsequently, we save the excel file into csv format for analysis in R. The file will be saved with the name “pop_data_nga.csv”.

pop_data<-read.csv("data/pop_data_nga.csv")

osun_pop <- pop_data %>% 
  rename(shapeName = `Local.gov..area.`, 
         HASC = `HASC....`,
         Capital = `Capital.....`, 
         Population = `Population....`,
         State = `State....`) |> 
  filter(State == "Osun")

To plot the population data, we use ADM2 data which defines the specific geoboundaries of LGAs in states.

#Joining data from both data frames, preserving sf  properties
geoNGA2_osun<-bd |> 
  left_join(osun_pop, by=c("ADM2_EN"="shapeName"))

5.2 Plotting waterpoint data and population data

With our data ready, we can plot the projected population using tmap functions.

tm_shape(geoNGA2_osun)+
  tm_fill("Density",
          style = "quantile", 
          palette = "-Blues",
          title = "Population Density of Osun LGAs")+
  tm_layout(main.title = "Population Density of Osun LGAs",
            main.title.position = "center",
            main.title.size = 1.2,
            legend.height = 0.45, 
            legend.width = 0.35,
            frame = TRUE) +
  tm_borders(alpha = 0.5) +
  tm_scale_bar() +
  tm_grid(alpha =0.2)+
  tm_shape(filter(df,status_redefined=="Functional"))+
  tm_dots(col="green",
          size=0.01,
          border.col="black",
           border.lwd=0.5,
          alpha=0.5)+
  tm_shape(filter(df,status_redefined=="Non-functional"))+
  tm_dots(col="red",
          size=0.01,
          border.col="black",
           border.lwd=0.5,
          alpha=0.5)

Change tmap_mode to plot

tmap_mode("plot")

5.3 Conclusions

From the plot, we see that high population density areas such as Ife Central, Ede North, Boripe, Ilesha West and Orolu have high coincidence of non-functional waterpoints. This could possibly imply the overuse of waterpoints by the larger population. More importantly, this finding can be used by government agencies who are pioritising repair schedules for the non-functional waterpoints.